Abstract

Debugging of ontologies is an important prerequisite for their wide-spread application, especially in areas that rely 
upon everyday users to create and maintain knowledge bases, as in the case of the Semantic Web. Recent approaches 
use diagnosis methods to identify causes of inconsistent or incoherent ontologies. However, in most debugging 
scenarios these methods return many alternative diagnoses, thus placing the burden of fault localization on the user. 
This paper demonstrates how the target diagnosis can be identified by performing a sequence of observations, that is, 
by querying an oracle about entailments of the target ontology. We exploit a-priori probabilities of typical user errors 
to formulate information-theoretic concepts for query selection. Our evaluation showed that the proposed method 
significantly reduces the number of required queries compared to myopic strategies. We experimented with different 
probability distributions of user errors and different qualities of the a-priori probabilities. Our measurements showed 
the advantage of the information-theoretic approach to query selection even in cases where only a rough estimate
of the priors is available. 
1. Introduction

Acquisition and maintenance of knowledge bases is 
an important prerequisite for a successful application of 
semantic systems in areas such as the Semantic Web. 
At the current state of the art, ontology extraction methods
do not allow a complete and error-free automatic
acquisition of ontologies. Thus users of semantic systems
are required to formulate and correct logical descriptions
on their own. In most cases these
users are domain experts who have little or no experience
in expressing their knowledge in representation
languages like OWL [1]. Studies in cognitive psychology,
e.g. [2,3], discovered that humans make systematic
errors while formulating or interpreting logical descriptions.
Results presented in [4,5] confirmed these observations
regarding ontology development. Therefore it is
essential to create methods that can identify and correct
erroneous ontological definitions.

Ontology debugging methods [6,7,8,9] simplify the
development of ontologies. Usually the main requirement
for the debugging process is to obtain a consistent
and, optionally, coherent ontology. These basic requirements
can be extended by additional ones, such as test
cases [8], which must be fulfilled by the target ontology
$\mathcal{O}_t$. Given the requirements (e.g. formulated by a user),
an ontology debugger identifies a set of alternative diagnoses,
where each diagnosis corresponds to a set of
possibly faulty axioms. In particular, a diagnosis $\mathcal{D}$ is
a subset of an ontology $\mathcal{O}$ such that removal of the diagnosis
from the ontology (i.e. $\mathcal{O} \setminus \mathcal{D}$) will allow the
formulation of the target ontology $\mathcal{O}_t$ that fulfills all the
requirements. We call the removal of a diagnosis from
the ontology a trivial application of a diagnosis. Moreover,
in practical applications it might be inefficient to
consider all possible diagnoses. Therefore, modern ontology
debugging approaches focus on the computation
of minimal diagnoses, i.e. diagnoses $\mathcal{D}_i$ such that no
$\mathcal{D}' \subset \mathcal{D}_i$ is a diagnosis. A user has to change at least all
of the axioms of a minimal diagnosis in order to formulate
the intended target ontology.

However, the diagnosis methods can return many alternative
minimal diagnoses for a given set of test cases
and requirements. A sample study of real-world incoherent
ontologies, which were used in [7], shows that
there may exist hundreds or even thousands of minimal
diagnoses. In the case of the Transportation ontology
the diagnosis method was able to identify 1782 minimal
diagnoses. In such situations a simple visualization
of all alternative modifications of the ontology is
ineffective. The goal of sequential debugging is to identify
the set of axioms $\mathcal{D}_t$ of an ontology which have to
be changed or removed in order to formulate the target
ontology $\mathcal{O}_t$. The set of axioms $\mathcal{D}_t$ is called the target
diagnosis. Consequently, the target ontology corresponds
to the ontology resulting from a removal of the target
diagnosis from the original ontology and an extension
by some additional axioms $EX$, i.e. $\mathcal{O}_t = (\mathcal{O} \setminus \mathcal{D}_t) \cup EX$.

Preprint submitted to Web Semantics: Science, Services and Agents on the World Wide Web, July 22, 2011

A possible solution of the problem would be to intro- 
duce an order on the set of diagnoses by means of some 
preference criteria. For instance, Kalyanpur et al. [10]
suggest measures to rank the axioms of a diagnosis de- 
pending on their structure, occurrence in test cases, etc. 
Only the top ranking diagnoses are then presented to 
the user. Of course this set of diagnoses will contain the 
target diagnosis only in the case when a faulty ontology, 
the given requirements and test cases, provide sufficient 
data to the appropriate heuristic. Therefore, in most de- 
bugging sessions a user has to input additional informa- 
tion (e.g. tests in form of required implications of facts 
or axioms) to identify the target diagnosis. However, it 
is hard to guess which information is required. That is,
a user does not know a priori which and how many tests 
should be provided to the debugger, such that it will re- 
turn the target diagnosis. 

In this paper we present an approach for the acquisi- 
tion of additional information by generating a sequence 
of queries, which should be answered by some oracle 
such as a user, an information extraction system, etc. 
Each answer to a query is used by our method to re- 
duce the set of diagnoses until, finally, the target di- 
agnosis is identified. In order to construct queries we 
exploit the property that different ontologies resulting 
from trivial applications of different diagnoses entail un- 
equal sets of axioms. Consequently, we can differentiate 
between diagnoses by asking the oracle if the target on- 
tology should imply a logical sentence or not. These 
implied logical sentences can be generated by classification
and realization services provided in description
logic reasoning systems [11,12,13]. In particular, the
classification process computes a subsumption hierarchy
(sometimes also called "inheritance hierarchy" of
parents and children) for each concept name mentioned
in a TBox. For each individual mentioned in an ABox,
the realization computes the atomic classes (or concept
names) of which the individual is an instance [11].

In order to generate the most informative query we 
exploit the fact that some diagnoses are more likely than 
others because of typical user errors [4,5]. A user's beliefs
that an error occurs in some part of a knowledge
base, represented as probabilities, can be used to esti- 
mate the change in entropy of the set of diagnoses if a 
particular query is answered. We select those queries 
which minimize the expected entropy, i.e. maximize 
the information gain. An oracle should answer these 
queries until a diagnosis is identified whose probability 
is significantly higher than those of all other diagnoses. 
This diagnosis is the most likely to be the target one. 

We compare our entropy-based method with a greedy 
approach that selects those queries which try to cut the 
number of diagnoses in half. The evaluation is per- 
formed using generated examples as well as real-world 
ontologies presented in Table 8. In the first case we alter
a consistent and coherent ontology with additional
axioms to generate such conflicts that result in a prede- 
fined number of diagnoses of required length. A faulty 
ontology is then analyzed by the debugging algorithm 
using entropy, greedy and "random" strategies, where 
the latter selects queries to be asked completely ran- 
domly. Evaluation results show that on average the 
suggested entropy-based approach is almost 50% bet- 
ter than the greedy one. In the second evaluation sce- 
nario we analyzed the performance of entropy-based 
and greedy strategies on real-world ontologies given 
different input settings. In particular, we simulated dif- 
ferent strategies for a user to assign prior probabilities 
as well as the quality of these probabilities that might 
occur in practice. The obtained results show that the 
entropy method outperformed the greedy heuristic in 
most of the cases. In some situations the entropy-based 
approach achieved twice as good average performance 
compared to the greedy one. Moreover, the evaluation 
on the real-world ontologies showed that the entropy- 
based query selection is robust to the actual values of 
prior fault probabilities as well as differences between 
them. It is only important whether the specified priors 
favor the target diagnosis or not. 

The remainder of the paper is organized as follows:
Section 2 presents two introductory examples as well
as the basic concepts. The details of the entropy-based
query selection method are given in Section 3. Section 4
describes the implementation of the approach and is followed
by evaluation results in Section 5. The paper concludes
with an overview of related work.



2. Motivating examples and basic concepts 

First, we present the fundamental concepts regarding 
the diagnosis of ontologies and eventually show how 
queries and answers can be generated and employed to 
differentiate between sets of diagnoses. 

2.1. Diagnosis of ontologies 

Example 1. Consider a simple ontology $\mathcal{O}$ with the terminology $\mathcal{T}$:

$ax_1 : A \sqsubseteq B$      $ax_2 : B \sqsubseteq C$
$ax_3 : C \sqsubseteq D$      $ax_4 : D \sqsubseteq R$

and assertions $\mathcal{A} : \{A(w), \neg R(w), A(v)\}$.

Let the user explicitly define that the three assertional
axioms should be considered as correct, i.e. these axioms
are added to a background theory $\mathcal{B}$. The introduction
of a background theory keeps the diagnosis method
focused on the possibly faulty axioms.

Assume that the user requires the ontology $\mathcal{O}$ to be
consistent, whereas $\mathcal{O}$ is inconsistent. The only irreducible
set of axioms (minimal conflict set) that preserves
the inconsistency is $CS : \{\langle ax_1, ax_2, ax_3, ax_4 \rangle\}$.
That is, one has to modify or remove the axioms of at
least one of the following diagnoses

$\mathcal{D}_1 : [ax_1]$    $\mathcal{D}_2 : [ax_2]$    $\mathcal{D}_3 : [ax_3]$    $\mathcal{D}_4 : [ax_4]$

to restore the consistency of the ontology. However, it is
unclear which diagnosis from the set $\mathbf{D} : \{\mathcal{D}_1, \ldots, \mathcal{D}_4\}$
corresponds to the target one.

The target diagnosis can be identified by the debugger
given a set of axioms $P$ that must be entailed by the
target ontology and a set of axioms $N$ that must not:

1. $\mathcal{O}_t \models p \quad \forall p \in P$

2. $\mathcal{O}_t \not\models n \quad \forall n \in N$

For instance, if the user provides the information that
$\mathcal{O}_t \models B(w)$ and $\mathcal{O}_t \not\models C(w)$ then the debugger will return
only one diagnosis in our example, namely $\mathcal{D}_2$.
Application of this diagnosis results in a satisfiable ontology
$\mathcal{O}_2 = \mathcal{O} \setminus \mathcal{D}_2$ that entails $B(w)$ because of $ax_1$
and the assertion $A(w)$. In addition, $\mathcal{O}_2$ does not entail
$C(w)$ since $\mathcal{O}_2 \cup \neg C(w)$ is satisfiable and, moreover,
$\neg R(w) \cup ax_4 \cup ax_3 \models \neg C(w)$. All other ontologies
$\mathcal{O}_i = (\mathcal{O} \setminus \mathcal{D}_i)$ obtained by the application of the diagnoses
$\mathcal{D}_1$, $\mathcal{D}_3$ and $\mathcal{D}_4$ do not fulfill the given requirements:
$\mathcal{O}_1 \cup B(w)$ is unsatisfiable, so no
satisfiable extension of $\mathcal{O}_1$ can entail $B(w)$, and both
$\mathcal{O}_3$ and $\mathcal{O}_4$ entail $C(w)$. Therefore, $\mathcal{D}_2$ corresponds to
the target diagnosis $\mathcal{D}_t$.
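The filtering step of Example 1 can be replayed with a small sketch. It is not a description logic reasoner: it assumes that the terminology consists only of atomic subsumptions, tracks the individual w only, and uses hypothetical names (`TBOX`, `closure`, `surviving_diagnoses`) chosen for illustration. A candidate diagnosis is kept if the remaining axioms together with the required entailments are consistent and entail no forbidden sentence.

```python
# Toy model of Example 1, restricted to the individual w.
# Each axiom is a pair (sub, sup) standing for the subsumption sub ⊑ sup.
TBOX = {"ax1": ("A", "B"), "ax2": ("B", "C"), "ax3": ("C", "D"), "ax4": ("D", "R")}
DIAGNOSES = {"D1": {"ax1"}, "D2": {"ax2"}, "D3": {"ax3"}, "D4": {"ax4"}}
POSITIVE_W = {"A"}  # background assertion A(w)
NEGATIVE_W = {"R"}  # background assertion ¬R(w)


def closure(axioms, positive, negative):
    """Propagate memberships of w: forwards for X(w), backwards for ¬X(w)."""
    positive, negative = set(positive), set(negative)
    changed = True
    while changed:
        changed = False
        for sub, sup in axioms:
            if sub in positive and sup not in positive:
                positive.add(sup)
                changed = True
            if sup in negative and sub not in negative:
                negative.add(sub)
                changed = True
    return positive, negative


def surviving_diagnoses(diagnoses, must_entail, must_not_entail):
    """Keep D_i if (O minus D_i) plus P is consistent and entails nothing in N."""
    kept = []
    for name, removed in diagnoses.items():
        axioms = [ax for key, ax in TBOX.items() if key not in removed]
        positive, negative = closure(axioms, POSITIVE_W | set(must_entail), NEGATIVE_W)
        consistent = not (positive & negative)
        if consistent and not (positive & set(must_not_entail)):
            kept.append(name)
    return kept


# Requirements of the example: B(w) must be entailed, C(w) must not.
print(surviving_diagnoses(DIAGNOSES, {"B"}, {"C"}))
```

Under these assumptions only the second diagnosis survives, which agrees with the discussion of Example 1.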



Note that the approach presented in this paper can 
also be used with knowledge representation languages 
without negation like OWL 2 EL if an underlying rea- 
soner supports both consistency and entailment check- 
ing. 

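Definition 1. Given a diagnosis problem instance
$\langle \mathcal{O}, \mathcal{B}, P, N \rangle$ where $\mathcal{O}$ is an ontology, $\mathcal{B}$ a background
theory, $P$ a set of logical sentences which must be implied
by the target ontology $\mathcal{O}_t$, and $N$ a set of logical
sentences which must not be implied by $\mathcal{O}_t$.

A diagnosis is a set of axioms $\mathcal{D} \subseteq \mathcal{O}$ iff the set of
axioms $\mathcal{O} \setminus \mathcal{D}$ can be extended by a logical description
$EX$ such that:

1. $(\mathcal{O} \setminus \mathcal{D}) \cup \mathcal{B} \cup EX$ is consistent (and coherent if
required by a user)

2. $(\mathcal{O} \setminus \mathcal{D}) \cup \mathcal{B} \cup EX \models p$ for all $p \in P$

3. $(\mathcal{O} \setminus \mathcal{D}) \cup \mathcal{B} \cup EX \not\models n$ for all $n \in N$

Following the standard definition of diagnosis [14,15],
it is assumed that each axiom $ax_j \in \mathcal{D}_i$ is faulty,
whereas each axiom $ax_k \in \mathcal{O} \setminus \mathcal{D}_i$ is correct.

If $\mathcal{D}_t$ is the set of axioms of $\mathcal{O}$ to be changed (i.e.
$\mathcal{D}_t$ is the target diagnosis) then the target ontology $\mathcal{O}_t$ is
$(\mathcal{O} \setminus \mathcal{D}_t) \cup \mathcal{B} \cup EX$ for some $EX$ defined by the user.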

Definition 2. A diagnosis $\mathcal{D}$ for a diagnosis problem
instance $\langle \mathcal{O}, \mathcal{B}, P, N \rangle$ is a minimal diagnosis iff there is
no proper subset of the faulty axioms $\mathcal{D}' \subset \mathcal{D}$ such that
$\mathcal{D}'$ is a diagnosis.

Definition 3. A diagnosis $\mathcal{D}$ for a diagnosis problem instance
$\langle \mathcal{O}, \mathcal{B}, P, N \rangle$ is a minimum cardinality diagnosis
iff there is no diagnosis $\mathcal{D}'$ such that $|\mathcal{D}'| < |\mathcal{D}|$.

The extension EX plays an important role in the re- 
pair process of an ontology. A diagnosis suggests only 
some set of axioms, which have to be removed from 
an ontology by the user, but it does not make any sug- 
gestion on axioms that have to be added to the ontol- 
ogy. For instance, given our example ontology O, the 
user requires that the target ontology must not entail 
$B(w)$ but has to entail $B(v)$, that is $N = \{B(w)\}$ and
$P = \{B(v)\}$. Because the example ontology is inconsistent,
some sentences must be changed. The consistent
ontology $\mathcal{O}_1 = \mathcal{O} \setminus \mathcal{D}_1$ entails neither $B(v)$ nor $B(w)$ (in
particular $\mathcal{O}_1 \models \neg B(w)$). Consequently, $\mathcal{O}_1$ has to be extended
with some set $EX$ of logical sentences in order to
entail $B(v)$. This set of logical sentences can be simply
approximated with $EX = \{B(v)\}$. $\mathcal{O}_1 \cup EX$ is satisfiable,
entails $B(v)$ but does not entail $B(w)$.

All other ontologies $\mathcal{O}_i = \mathcal{O} \setminus \mathcal{D}_i$, $i = 2, 3, 4$ are
consistent but entail both $B(w)$ and $B(v)$ and must be
rejected because of the monotonic semantics of description
logic. That is, there is no extension $EX$ such that
$(\mathcal{O}_i \cup EX) \not\models B(w)$. Therefore, the diagnosis $\mathcal{D}_1$ is the
minimum cardinality diagnosis which allows the formulation
of the target ontology by changing a minimal
number of axioms.

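The following corollary characterizes diagnoses
without the true extension $EX$ employed to formulate
the target ontology. The idea is to use the sentences
which must be entailed by the target ontology to approximate
$EX$ as shown above.

Corollary 1. Given a diagnosis problem $\langle \mathcal{O}, \mathcal{B}, P, N \rangle$, a
set of axioms $\mathcal{D} \subseteq \mathcal{O}$ is a diagnosis iff

$$(\mathcal{O} \setminus \mathcal{D}) \cup \mathcal{B} \cup \{\bigwedge_{p \in P} p\}$$

is satisfiable (coherent) and

$$\forall n \in N : (\mathcal{O} \setminus \mathcal{D}) \cup \mathcal{B} \cup \{\bigwedge_{p \in P} p\} \not\models n$$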

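In the following we assume that a diagnosis always
exists. A diagnosis exists iff the background theory together
with the axioms in $P$ is consistent (coherent)
and no axiom in $N$ is entailed, i.e.

Proposition 1. A diagnosis $\mathcal{D}$ for a diagnosis problem
$\langle \mathcal{O}, \mathcal{B}, P, N \rangle$ exists iff

$$\mathcal{B} \cup \{\bigwedge_{p \in P} p\}$$

is consistent (coherent) and

$$\forall n \in N : \mathcal{B} \cup \{\bigwedge_{p \in P} p\} \not\models n$$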

For the computation of diagnoses conflict sets are 
usually employed to constrain the search space. A con- 
flict set is the part of the ontology that preserves the 
inconsistency/incoherency. 

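Definition 4. Given a diagnosis problem instance
$\langle \mathcal{O}, \mathcal{B}, P, N \rangle$, a set of axioms $CS \subseteq \mathcal{O}$ is a conflict set
iff $CS \cup \mathcal{B} \cup \{\bigwedge_{p \in P} p\}$ is inconsistent (incoherent) or
there is an $n \in N$ s.t. $CS \cup \mathcal{B} \cup \{\bigwedge_{p \in P} p\} \models n$.

Definition 5. A conflict set $CS$ for an instance
$\langle \mathcal{O}, \mathcal{B}, P, N \rangle$ is minimal iff there is no proper subset
$CS' \subset CS$ such that $CS'$ is a conflict set.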

A set of minimal conflict sets can be used to compute 
the set of minimal diagnoses as it is shown in [14]. The
idea is that each diagnosis should include at least one 
element of each minimal conflict set. 
Ontology            Entailments
$\mathcal{O}_1$     $\emptyset$
$\mathcal{O}_2$     $\{B(w)\}$
$\mathcal{O}_3$     $\{B(w), C(w)\}$
$\mathcal{O}_4$     $\{B(w), C(w), D(w)\}$

Table 1: Entailments of the ontologies $\mathcal{O}_i$ in Example 1 returned by realization.

Proposition 2. $\mathcal{D}$ is a minimal diagnosis for the diagnosis problem
instance $\langle \mathcal{O}, \mathcal{B}, P, N \rangle$ iff $\mathcal{D}$ is a minimal hitting set
for the set of all minimal conflict sets of the instance.

Most of the modern ontology diagnosis methods [6,7,8,9]
are implemented according to Proposition 2 and
differ in details, e.g. how and when (minimal) conflict
sets are computed, the order in which hitting sets are
generated, etc.
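The relation between minimal conflict sets and minimal diagnoses can be illustrated with a brute-force sketch. It enumerates candidate axiom sets by increasing size and is therefore only suitable for tiny inputs; it is not the hitting-set tree construction used by the cited diagnosis methods, and the function name `minimal_hitting_sets` is chosen here for illustration only.

```python
from itertools import combinations


def minimal_hitting_sets(conflict_sets):
    """Return all minimal sets that intersect every given conflict set."""
    universe = sorted(set().union(*conflict_sets))
    found = []
    # Candidates are tried by increasing size, so every hitting set that is
    # not a superset of an earlier result is guaranteed to be minimal.
    for size in range(1, len(universe) + 1):
        for candidate in combinations(universe, size):
            candidate = frozenset(candidate)
            if any(smaller <= candidate for smaller in found):
                continue  # a proper subset already hits all conflict sets
            if all(candidate & conflict for conflict in conflict_sets):
                found.append(candidate)
    return found


# Example 1: a single minimal conflict set containing all four axioms.
example_1 = [{"ax1", "ax2", "ax3", "ax4"}]
for diagnosis in minimal_hitting_sets(example_1):
    print(sorted(diagnosis))
```

For the single conflict set of Example 1 every axiom alone is a hitting set, which corresponds to the four single-axiom diagnoses listed there.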

2.2. Differentiating between diagnoses 

The diagnosis method usually generates a set of diag- 
noses for a given diagnosis problem instance. Thus, in 
Example 1 an ontology debugger returns four minimal
diagnoses $\{\mathcal{D}_1, \ldots, \mathcal{D}_4\}$. As shown in the previous
section, additional information, i.e. sets of logical sentences
$P$ and $N$, can be used by the debugger to reduce
the set of diagnoses. However, in the general case the 
user does not know which sets P, N of logical sentences 
should be provided to the debugger s.t. the target diag- 
nosis is identified. Therefore, the debugger should be 
able to identify sets of logical sentences on its own and 
only ask the user or some other oracle, whether these 
sentences must or must not be entailed by the target ontology.
To generate these sentences the debugger can
apply each of the diagnoses $\mathbf{D} = \{\mathcal{D}_1, \ldots, \mathcal{D}_n\}$ and obtain
a set of ontologies $\mathcal{O}_i = \mathcal{O} \setminus \mathcal{D}_i$ that fulfill the user requirements.
For every ontology $\mathcal{O}_i$ a description logic
reasoner can generate a set of entailments such as entailed
subsumptions provided by the classification service
and sets of class assertions provided by the realization.
In fact, the intention of the classification is that a
model for a specific application domain can be verified 
by exploiting the subsumption hierarchy fTSl. These 
entailments can be used to discriminate between the di- 
agnoses, as different ontologies are likely to entail dif- 
ferent sets of sentences. In the following we consider 
only two types of entailments that can be computed by 
a description logic reasoner, namely subsumptions and 
class assertions. In general, the approach presented in 
this paper is not limited to these types and can use all 
possible entailment types supported by a reasoner. 



For instance, in Example 1 for each ontology
$\mathcal{O}_i = (\mathcal{O} \setminus \mathcal{D}_i)$, $i = 1, \ldots, 4$, the realization service of a reasoner
returns the set of class assertions presented in Table 1.
Without any additional information the debugger cannot
decide which of these sentences must be entailed by the
target ontology. To get this information the diagnosis
method should be able to access some oracle that can
answer whether the target ontology entails some set of
sentences or not. E.g. the debugger asks an oracle if
$D(w)$ is entailed by the target ontology ($\mathcal{O}_t \models D(w)$). If
the answer is yes, then $D(w)$ is added to $P$ and $\mathcal{D}_4$ is
considered as the target diagnosis. All other diagnoses
are rejected because $(\mathcal{O} \setminus \mathcal{D}_i) \cup \mathcal{B} \cup \{D(w)\}$ for $i = 1, 2, 3$
is inconsistent. If the answer is no, then $D(w)$ is added
to $N$ and $\mathcal{D}_4$ is rejected as $(\mathcal{O} \setminus \mathcal{D}_4) \cup \mathcal{B} \models D(w)$ and we
have to ask the oracle another question.

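Property 1. Given a diagnosis problem $\langle \mathcal{O}, \mathcal{B}, P, N \rangle$, a
set of diagnoses $\mathbf{D}$, and a set of logical sentences $Q$ representing
the query $\mathcal{O}_t \models Q$:

If the oracle gives the answer yes then every diagnosis
$\mathcal{D}_i \in \mathbf{D}$ is a diagnosis for $P \cup Q$ iff both conditions
hold:

$$(\mathcal{O} \setminus \mathcal{D}_i) \cup \mathcal{B} \cup \{\bigwedge_{p \in P} p\} \cup Q \text{ is consistent (coherent)}$$

$$\forall n \in N : (\mathcal{O} \setminus \mathcal{D}_i) \cup \mathcal{B} \cup \{\bigwedge_{p \in P} p\} \cup Q \not\models n$$

If the oracle gives the answer no then every diagnosis
$\mathcal{D}_i \in \mathbf{D}$ is a diagnosis for $N \cup Q$ iff both conditions hold:

$$(\mathcal{O} \setminus \mathcal{D}_i) \cup \mathcal{B} \cup \{\bigwedge_{p \in P} p\} \text{ is consistent (coherent)}$$

$$\forall n \in (N \cup Q) : (\mathcal{O} \setminus \mathcal{D}_i) \cup \mathcal{B} \cup \{\bigwedge_{p \in P} p\} \not\models n$$

In particular, a query partitions the set of diagnoses $\mathbf{D}$
into three mutually disjoint subsets.

Definition 6. For a query $Q$ each diagnosis $\mathcal{D}_i \in \mathbf{D}$ of a
diagnosis problem instance $\langle \mathcal{O}, \mathcal{B}, P, N \rangle$ can be assigned
to one of the three sets $\mathbf{D}^P$, $\mathbf{D}^N$ or $\mathbf{D}^\emptyset$ where

• $\mathcal{D}_i \in \mathbf{D}^P$ if it holds that

$$(\mathcal{O} \setminus \mathcal{D}_i) \cup \mathcal{B} \cup \{\bigwedge_{p \in P} p\} \models Q$$

• $\mathcal{D}_i \in \mathbf{D}^N$ if it holds that

$$(\mathcal{O} \setminus \mathcal{D}_i) \cup \mathcal{B} \cup \{\bigwedge_{p \in P} p\} \cup Q$$

is inconsistent (incoherent).

• $\mathcal{D}_i \in \mathbf{D}^\emptyset$ if $\mathcal{D}_i \notin (\mathbf{D}^P \cup \mathbf{D}^N)$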



Given a diagnosis problem instance we say that the
diagnoses in $\mathbf{D}^P$ predict a positive answer (yes) as a result
of the query $Q$, diagnoses in $\mathbf{D}^N$ predict a negative
answer (no), and diagnoses in $\mathbf{D}^\emptyset$ do not make any predictions.

Property 2. Given a diagnosis problem instance
$\langle \mathcal{O}, \mathcal{B}, P, N \rangle$, a set of diagnoses $\mathbf{D}$, and a query $Q$:

If the oracle gives the answer yes then the set of rejected
diagnoses is $\mathbf{D}^N$ and the set of remaining diagnoses
is $\mathbf{D}^P \cup \mathbf{D}^\emptyset$.

If the oracle gives the answer no then the set of rejected
diagnoses is $\mathbf{D}^P$ and the set of remaining diagnoses
is $\mathbf{D}^N \cup \mathbf{D}^\emptyset$.

Consequently, given a query $Q$ either $\mathbf{D}^P$ or $\mathbf{D}^N$ is
eliminated but $\mathbf{D}^\emptyset$ always remains after the query is answered.
For generating queries we have to investigate
for which subsets $\mathbf{D}^P, \mathbf{D}^N \subseteq \mathbf{D}$ a query exists that can
differentiate between these sets. A straightforward approach
for query generation is to investigate all possible
subsets of D. This is feasible if we limit the number n of 
minimal diagnoses to be considered during query gen- 
eration and selection. E.g. for n = 9 in the worst case 
the algorithm has to verify 512 possible partitions. 

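Given a set of diagnoses $\mathbf{D}$ for the ontology $\mathcal{O}$, a set $P$
of sentences that must be entailed by the target ontology
$\mathcal{O}_t$ and a set of background axioms $\mathcal{B}$, the set of partitions
$PR$ for which a query exists can be computed as
follows:

1. Generate the power set $\mathcal{P}(\mathbf{D})$, $PR \leftarrow \emptyset$

2. Assign to the set $\mathbf{D}_i^P$ an element of $\mathcal{P}(\mathbf{D})$ and generate
a set of common entailments $E_i$ of all ontologies
$\mathcal{O} \setminus \mathcal{D}_j$, where $\mathcal{D}_j \in \mathbf{D}_i^P$

3. If $E_i = \emptyset$ then reject the current element, i.e. set
$\mathcal{P}(\mathbf{D}) \leftarrow \mathcal{P}(\mathbf{D}) \setminus \{\mathbf{D}_i^P\}$, and goto Step 2. Otherwise
set $Q_i \leftarrow E_i$.

4. Use Definition 6 and the query $Q_i$ to classify the
diagnoses $\mathcal{D}_k \in \mathbf{D} \setminus \mathbf{D}_i^P$ into the sets $\mathbf{D}_i^P$, $\mathbf{D}_i^N$ and
$\mathbf{D}_i^\emptyset$. The generated partition is added to the set of
partitions $PR \leftarrow PR \cup \{\langle Q_i, \mathbf{D}_i^P, \mathbf{D}_i^N, \mathbf{D}_i^\emptyset \rangle\}$ and set
$\mathcal{P}(\mathbf{D}) \leftarrow \mathcal{P}(\mathbf{D}) \setminus \{\mathbf{D}_i^P\}$. If $\mathcal{P}(\mathbf{D}) \neq \emptyset$ then goto Step 2.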

In Example 1 the set of diagnoses $\mathbf{D}$ of the ontology $\mathcal{O}$
contains 4 elements. Therefore, the power set $\mathcal{P}(\mathbf{D})$ includes
16 elements $\{\emptyset, \{\mathcal{D}_1\}, \{\mathcal{D}_2\}, \ldots, \{\mathcal{D}_1, \mathcal{D}_2, \mathcal{D}_3, \mathcal{D}_4\}\}$.
However, we can omit the element corresponding to
$\emptyset$ as it does not contain any diagnoses to be evaluated.
Moreover, assume that $P$ and $N$ are empty. On each iteration
an element of $\mathcal{P}(\mathbf{D})$ is assigned to the set $\mathbf{D}_i^P$. For
instance, the algorithm assigned $\mathbf{D}_1^P = \{\mathcal{D}_1, \mathcal{D}_2\}$. In this
case the set of common entailments is empty as $\mathcal{O} \setminus \mathcal{D}_1$
has no entailed instances (in addition to the given class
assertions, see Table 1). Therefore, the set $\{\mathcal{D}_1, \mathcal{D}_2\}$ is
rejected and removed from $\mathcal{P}(\mathbf{D})$. Assume that on the
next iteration the algorithm selected $\mathbf{D}_2^P = \{\mathcal{D}_2, \mathcal{D}_3\}$. In
this case the set of common entailments $E_2 = \{B(w)\}$
is not empty and so $Q_2 = \{B(w)\}$. The remaining diagnoses
$\mathcal{D}_1$ and $\mathcal{D}_4$ are classified according to Definition 6.
That is, the algorithm selects the first diagnosis
$\mathcal{D}_1$ and verifies whether $(\mathcal{O} \setminus \mathcal{D}_1) \models \{B(w)\}$. Given the
negative answer of the reasoner, the algorithm checks if
$(\mathcal{O} \setminus \mathcal{D}_1) \cup \{B(w)\}$ is inconsistent. Since the condition is
satisfied the diagnosis $\mathcal{D}_1$ is added to the set $\mathbf{D}_2^N$. The
second diagnosis $\mathcal{D}_4$ is added to the set $\mathbf{D}_2^P$ as it satisfies
the first requirement $(\mathcal{O} \setminus \mathcal{D}_4) \models \{B(w)\}$. The resulting
partition $\langle \{B(w)\}, \{\mathcal{D}_2, \mathcal{D}_3, \mathcal{D}_4\}, \{\mathcal{D}_1\}, \emptyset \rangle$ is added to the
set $PR$.
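The classification step used in this walk-through can be sketched as follows. The sketch assumes empty sets of positive and negative test cases, models only the individual w of Example 1, and replaces the description logic reasoner by a simple propagation over atomic subsumptions; all names (`TBOX`, `closure`, `classify`) are hypothetical.

```python
# Toy stand-in for a reasoner on Example 1 (individual w only).
TBOX = {"ax1": ("A", "B"), "ax2": ("B", "C"), "ax3": ("C", "D"), "ax4": ("D", "R")}
DIAGNOSES = {"D1": {"ax1"}, "D2": {"ax2"}, "D3": {"ax3"}, "D4": {"ax4"}}


def closure(axioms, positive, negative):
    """Propagate X(w) forwards and ¬X(w) backwards along sub ⊑ sup."""
    positive, negative = set(positive), set(negative)
    changed = True
    while changed:
        changed = False
        for sub, sup in axioms:
            if sub in positive and sup not in positive:
                positive.add(sup)
                changed = True
            if sup in negative and sub not in negative:
                negative.add(sub)
                changed = True
    return positive, negative


def classify(diagnoses, query):
    """Split the diagnoses into (D^P, D^N, D^0) for a query on w."""
    d_p, d_n, d_0 = [], [], []
    for name, removed in diagnoses.items():
        axioms = [ax for key, ax in TBOX.items() if key not in removed]
        entailed, _ = closure(axioms, {"A"}, {"R"})
        if query <= entailed:
            d_p.append(name)  # O minus D_i entails the query
            continue
        positive, negative = closure(axioms, {"A"} | query, {"R"})
        if positive & negative:
            d_n.append(name)  # adding the query makes O minus D_i inconsistent
        else:
            d_0.append(name)  # no prediction
    return d_p, d_n, d_0


for query in ({"B"}, {"C"}, {"D"}):
    print(sorted(query), classify(DIAGNOSES, query))
```

For the query {B(w)} the sketch yields the same partition as the one derived above; the other two queries give the partitions used in the later discussion of query selection.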

However, a query need not include all of the entailed
sentences. If a query $Q$ partitions the set of diagnoses
into $\mathbf{D}^P$, $\mathbf{D}^N$ and $\mathbf{D}^\emptyset$ and there exists an (irreducible) subset
$Q' \subset Q$ which preserves the partition then it is sufficient
to query $Q'$. In our example, $Q_2 : \{B(w), C(w)\}$
can be reduced to its subset $Q'_2 : \{C(w)\}$. If there are
multiple irreducible subsets that preserve the partition
then we select one of them.

All queries and corresponding partitions generated in
Example 1 are presented in Table 2. Given these queries
the debugger has to decide which one should be asked
first in order to minimize the number of queries to be
answered. A popular query selection heuristic (called
"Split-in-half") prefers those queries which allow
half of the diagnoses to be removed from the set $\mathbf{D}$, regardless
of the answer of an oracle.

Using the data presented in Table 2, the "Split-in-half"
heuristic determines that asking the oracle if
$\mathcal{O}_t \models \{C(w)\}$ is the best query (i.e. the reduced query $Q'_2$),
as two diagnoses from the set $\mathbf{D}$ are removed regardless
of the answer. Let us assume that $\mathcal{D}_1$ is the target
diagnosis, then an oracle will answer no to our
question (i.e. $\mathcal{O}_t \not\models \{C(w)\}$). Based on this feedback,
the diagnoses $\mathcal{D}_3$ and $\mathcal{D}_4$ are removed according to
Property 2. Given the updated set of diagnoses $\mathbf{D}$ and
$N = \{C(w)\}$ the partitioning algorithm returns the only
partition $\langle \{B(w)\}, \{\mathcal{D}_2\}, \{\mathcal{D}_1\}, \emptyset \rangle$. Therefore we ask the
query $\{B(w)\}$, which is also answered with no by the
oracle. Consequently, we identified $\mathcal{D}_1$ as the only remaining
minimal diagnosis.
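A minimal sketch of such a selection is given below. The scoring function is an assumption made for illustration: it penalizes the imbalance between the diagnoses predicting yes and those predicting no, plus the diagnoses that make no prediction. The partitions are the ones derived in the text for the three class assertions about w in Example 1.

```python
def split_in_half_score(d_p, d_n, d_0):
    """Lower is better; 0 means either answer removes exactly half."""
    return abs(len(d_p) - len(d_n)) + len(d_0)


def select_query(partitions):
    """Return the query whose partition has the lowest score."""
    return min(partitions, key=lambda query: split_in_half_score(*partitions[query]))


# query -> (D^P, D^N, D^0), taken from the discussion of Example 1
partitions = {
    "B(w)": (["D2", "D3", "D4"], ["D1"], []),
    "C(w)": (["D3", "D4"], ["D1", "D2"], []),
    "D(w)": (["D4"], ["D1", "D2", "D3"], []),
}
print(select_query(partitions))
```

With these partitions the query on C(w) is preferred, because either answer eliminates two of the four diagnoses.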

In general, if $n$ is the number of diagnoses and we can
split the set of diagnoses in half by each query, then the
minimum number of queries is $\log_2 n$. However, if the
probabilities of diagnoses are known we can reduce this
number of queries by using two effects:

1. We can exploit diagnoses probabilities to assess the
probabilities of answers and the expected value of
information contained in the set of diagnoses after
an answer is given.

2. Even if there are multiple diagnoses in the set of remaining
diagnoses we can stop further query generation
if one diagnosis is highly probable and all
other remaining diagnoses are highly improbable.

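Example 2. Consider an ontology $\mathcal{O}$ with the terminology $\mathcal{T}$:

$ax_1 : A_1 \sqsubseteq A_2 \sqcap M_1 \sqcap M_2$        $ax_4 : M_2 \sqsubseteq \forall s.A \sqcap D$
$ax_2 : A_2 \sqsubseteq \neg\exists s.M_3 \sqcap \exists s.M_2$      $ax_5 : M_3 \equiv B \sqcup C$
$ax_3 : M_1 \sqsubseteq \neg A \sqcap B$

and the background theory containing the assertions
$\mathcal{A} : \{A_1(w), A_1(u), s(u, w)\}$.

The ontology is inconsistent and includes two minimal
conflict sets: $\{\langle ax_1, ax_3, ax_4 \rangle, \langle ax_1, ax_2, ax_3, ax_5 \rangle\}$.
To restore consistency, the user should modify all axioms
of at least one minimal diagnosis:

$\mathcal{D}_1 : [ax_1]$      $\mathcal{D}_3 : [ax_4, ax_5]$
$\mathcal{D}_2 : [ax_3]$      $\mathcal{D}_4 : [ax_4, ax_2]$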



Following the same approach as in the first example,
we compute a set of possible queries and corresponding
partitions using the algorithm presented above. A set of
irreducible queries possible in Example 2 and their partitions
are presented in Table 3. These queries partition
the set of diagnoses $\mathbf{D}$ in a way that makes the application
of myopic strategies, such as "Split-in-half", inefficient.
A greedy algorithm based on such a heuristic
would select the first query $Q_1$ as the next query, since
there is no query that cuts the set of diagnoses in half.
If $\mathcal{D}_4$ is the target diagnosis then $Q_1$ will be positively
evaluated by an oracle (see Figure 1). On the next iteration
the algorithm would also choose a suboptimal
query since there is no partition that divides the diagnoses
$\mathcal{D}_1$, $\mathcal{D}_2$, and $\mathcal{D}_4$ into two equal groups. Consequently,
it selects the first untried query $Q_2$. The oracle
answers positively, and the algorithm identifies query
$Q_4$ to differentiate between $\mathcal{D}_1$ and $\mathcal{D}_4$.

However, in real-world settings the assumption that
all axioms fail with the same probability rarely holds.
For example, Roussey et al. [5] present a list of
"anti-patterns". Each anti-pattern is a set of axioms,
like $\{C_1 \sqsubseteq \forall R.C_2, C_1 \sqsubseteq \forall R.C_3, C_2 \equiv \neg C_3\}$, that corresponds
to a minimal conflict set. The study performed
by the authors shows that such conflict sets occur often
in practice and therefore can be used to compute probabilities
of diagnoses.

The approach that we follow in this paper was suggested
by Rector et al. [4] and considers the syntax of
a knowledge representation language, such as restrictions,
conjunction, negation, etc., rather than axioms to
describe a failure pattern. For instance, if a user frequently
changes the universal to the existential quantifier
and vice versa in order to restore coherency, then
we can assume that axioms including restrictions are
more likely to fail than the other ones. In [4] the
authors report that in most cases inconsistent ontologies
were created because users (a) mix up $\forall r.S$ and $\exists r.S$,
(b) mix up $\neg\exists r.S$ and $\exists r.\neg S$, (c) mix up $\sqcup$ and $\sqcap$, (d)
wrongly assume that classes are disjoint by default or
overuse disjointness, (e) wrongly apply negation. Observing
that misuses of quantifiers are more likely than
other failure patterns one might find that the axioms $ax_2$
and $ax_4$ are more likely to be faulty than $ax_3$ (because of
the use of quantifiers), whereas $ax_3$ is more likely to be
faulty than $ax_5$ and $ax_1$ (because of the use of negation).

Figure 1: The search tree of the greedy algorithm

Detailed justifications of diagnoses probabilities are
given in the next section. However, let us assume some
probability distribution of the faults according to the observations
presented above such that: (a) the diagnosis
$\mathcal{D}_2$ is the most probable one, i.e. a single fault diagnosis
of an axiom containing a negation; (b) although $\mathcal{D}_4$ is
a double fault diagnosis, it follows $\mathcal{D}_2$ closely as its axioms
contain quantifiers; (c) $\mathcal{D}_1$ and $\mathcal{D}_3$ are significantly
less probable than $\mathcal{D}_2$ because conjunction/disjunction
in $ax_1$ and $ax_5$ have a significantly lower fault probability
than negation in $ax_3$. Taking into account this information
it is almost useless to ask query $Q_1$ because it
is highly probable that the target diagnosis is either $\mathcal{D}_2$
or $\mathcal{D}_4$ and, therefore, it is highly probable that the oracle
will respond with yes. Instead, asking $Q_3$ is more
informative because given any possible answer we can
exclude one of the highly probable diagnoses, i.e. either
$\mathcal{D}_2$ or $\mathcal{D}_4$. If the oracle responds to $Q_3$ with no then $\mathcal{D}_2$
is the only remaining diagnosis. However, if the oracle
responds with yes, diagnoses $\mathcal{D}_4$, $\mathcal{D}_3$, and $\mathcal{D}_1$ remain,
where $\mathcal{D}_4$ is significantly more probable compared to
diagnoses $\mathcal{D}_3$ and $\mathcal{D}_1$. We can stop if the difference between
the probabilities of the diagnoses is high enough
such that $\mathcal{D}_4$ can be accepted as the target diagnosis.
Otherwise, additional questions may be required. This
strategy can lead to a substantial reduction in the number
of queries compared to myopic approaches as we
will show in our evaluation.

Note that in real-world application scenarios failure 
patterns and their probabilities can be discovered by 
analyzing actions of a user in an ontology editor, like 
Protege, while debugging an ontology. In this case it 
is possible to "personalize" the measurement selection 
algorithm such that it will prefer user-specific faults. 
However, as our evaluation shows, only a rough estimate
of the probabilities is sufficient to outperform the "Split-in-half"
heuristic.



3. Entropy-based query selection 

To select the best query we make the assumption that
knowledge is available about the a-priori failure probabilities.
In our approach we follow the proposal of Rector
et al. [4] and describe failure patterns employing the
syntax of description logics or some other knowledge
representation language, such as OWL. That is, either a
user should express their own beliefs in terms of the probability
of a syntax element like $\forall$, $\exists$, $\sqcup$, etc. to be erroneous;
or the debugger can compute these probabilities by analyzing
how often a particular syntax element occurred
in target diagnoses of different debugging sessions. If
no information about failures is available then the debugger
can initialize all probabilities with some small
number.

Given failure probabilities of all syntax elements of a
knowledge representation language we can compute the
failure probability of an axiom

$$p(ax_k) = p(se_1 \cup se_2 \cup \cdots \cup se_n)$$

where $se_1, \ldots, se_n$ are syntax elements occurring in $ax_k$.
Assuming that all syntax elements fail independently,
i.e. an erroneous usage of a syntax element $se_i$ makes it
neither more nor less probable that a syntax element $se_j$
is faulty, the failure probability of an axiom is defined
as:

$$p(ax_k) = \sum_{1 \le i \le n} p(se_i) - \sum_{1 \le i < j \le n} p(se_i)\,p(se_j) + \cdots + (-1)^{n-1} \prod_{1 \le i \le n} p(se_i) \qquad (1)$$



For instance, the axiom ax2 in Example 2 includes
the following syntax elements {⊑, ¬, ∃, ⊓, ∃}. If among
other failure probabilities the user provides that p(⊑) =
0.001, p(¬) = 0.01, p(∃) = 0.05 and p(⊓) = 0.001 then
p(ax2) = 0.108.
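
The value for ax2 can be checked numerically. The following is a minimal sketch, assuming independent failures as stated above; under that assumption the inclusion-exclusion sum of Equation 1 collapses to one minus the product of the complements. The function name and the textual element labels are hypothetical and chosen only for illustration.

```python
from math import prod

def axiom_fault_probability(element_probs):
    """Probability that at least one syntax element of an axiom is faulty,
    assuming independent failures (closed form of the inclusion-exclusion sum)."""
    return 1.0 - prod(1.0 - p for p in element_probs)

# Hypothetical labels for the syntax elements of the example.
FAULT_PROB = {
    "subsumption": 0.001,
    "negation": 0.01,
    "exists": 0.05,
    "conjunction": 0.001,
}

# One entry per occurrence of a syntax element in ax2.
ax2_elements = ["subsumption", "negation", "exists", "conjunction", "exists"]
p_ax2 = axiom_fault_probability([FAULT_PROB[e] for e in ax2_elements])
print(round(p_ax2, 3))  # 0.108
```

Each occurrence of an element counts separately, which is why the existential restriction appears twice in the list.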

Given the failure probabilities p(ax_i) of axioms, the
diagnosis algorithm first calculates the a-priori
probability p(D_j) that D_j is the target diagnosis. Since all
axioms fail independently, this probability can be
computed as [15]:

$$p(D_j) = \prod_{ax_n \in D_j} p(ax_n) \prod_{ax_m \notin D_j} \bigl(1 - p(ax_m)\bigr) \tag{2}$$

The prior probabilities for diagnoses are then used to
initialize an iterative algorithm that includes two main
steps: (a) selection of the best query and (b) update of
the diagnoses probabilities given the query feedback.

According to information theory the best query is
the one that, given the answer of an oracle, minimizes
the expected entropy of the set of diagnoses [15]. Let
p(Q_i = yes) be the probability that query Q_i is answered
with yes and p(Q_i = no) be the probability for the
answer no. Let p(D_j | Q_i = yes) be the probability of
diagnosis D_j after the oracle answers yes and p(D_j | Q_i = no)
be the probability for the answer no. The expected
entropy after querying Q_i is:

$$H_e(Q_i) = \sum_{v \in \{yes, no\}} p(Q_i = v) \times \Bigl( - \sum_{D_j \in \mathbf{D}} p(D_j \mid Q_i = v) \log_2 p(D_j \mid Q_i = v) \Bigr)$$

The query which minimizes the expected entropy is
the best one based on a one-step-look-ahead
information-theoretic measure. This formula can be simplified
to the following score function [15], which we use to
evaluate all available queries and select the one with the
minimum score to maximize information gain:

$$sc(Q_i) = \sum_{v \in \{yes, no\}} \bigl[ p(Q_i = v) \log_2 p(Q_i = v) \bigr] + p(\mathbf{D}_i^{\emptyset}) + 1 \tag{3}$$

where D_i^∅ is the set of diagnoses which do not make any
predictions for the query Q_i and p(D_i^∅) is the total
probability of these diagnoses. Since, for a query Q_i, the
set of diagnoses D can be partitioned into the sets D_i^P,
D_i^N and D_i^∅, the probability that an oracle will answer
a query Q_i with either yes or no can be computed as:

$$p(Q_i = yes) = p(\mathbf{D}_i^{P}) + p(\mathbf{D}_i^{\emptyset})/2 \qquad p(Q_i = no) = p(\mathbf{D}_i^{N}) + p(\mathbf{D}_i^{\emptyset})/2 \tag{4}$$

This rests on the assumption that for each diagnosis in
D_i^∅ both outcomes are equally likely, so the diagnoses in
D_i^∅ contribute p(D_i^∅)/2 to the probability of each
answer.
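
The score function can be sketched in a few lines. This is only an illustration, assuming the three set probabilities have already been summed; the function names are hypothetical and the convention 0 · log2(0) = 0 is an assumption needed for the boundary cases.

```python
from math import log2

def answer_probabilities(p_pos, p_neg, p_none):
    """Equation 4: diagnoses without a prediction are split evenly
    between the answers yes and no."""
    return p_pos + p_none / 2.0, p_neg + p_none / 2.0

def score(p_pos, p_neg, p_none):
    """Equation 3: lower is better. 0 is a perfect split of the
    probability mass, 1 means the query yields no information."""
    total = 0.0
    for p in answer_probabilities(p_pos, p_neg, p_none):
        if p > 0.0:  # assumed convention: 0 * log2(0) = 0
            total += p * log2(p)
    return total + p_none + 1.0

# A query that splits the probability mass 0.75 / 0.25.
print(round(score(0.75, 0.25, 0.0), 4))  # 0.1887
```

A query on which no diagnosis makes a prediction, `score(0.0, 0.0, 1.0)`, receives the worst score of 1, as does a query that every diagnosis answers in the same way.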

By Definition 1 each diagnosis is a unique
partition of all axioms of an ontology O into correct
and faulty ones, so all diagnoses are mutually exclusive
events. Therefore the probabilities of their sets can be
calculated as:

$$p(S_i) = \sum_{D_j \in S_i} p(D_j)$$

where S_i corresponds to the sets D_i^P, D_i^N and D_i^∅
respectively.



Given the feedback v of an oracle to the selected
query Q_s, i.e. Q_s = v, we have to update the
probabilities of the diagnoses to take the new information into
account. The update is made using Bayes' rule for each
D_j ∈ D:

$$p(D_j \mid Q_s = v) = \frac{p(Q_s = v \mid D_j)\, p(D_j)}{p(Q_s = v)} \tag{5}$$

where the denominator p(Q_s = v) is known from the
query selection step (Equation 4) and p(D_j) is either a
prior probability (Equation 2) or a probability
calculated using Equation 5 after a previous iteration of the
debugging algorithm. We assign p(Q_s = v | D_j) as
follows:

$$p(Q_s = v \mid D_j) = \begin{cases} 1, & \text{if } D_j \text{ predicted } Q_s = v; \\ 0, & \text{if } D_j \text{ is rejected by } Q_s = v; \\ \tfrac{1}{2}, & \text{if } D_j \in \mathbf{D}_s^{\emptyset} \end{cases}$$
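
The update step can be illustrated with a short sketch. It assumes that the current probabilities are normalized and that the predictions of each diagnosis for the query are already known; the function name, the dictionary encoding and the sample predictions are hypothetical.

```python
def update(probs, predicts, answer):
    """Bayes update of diagnosis probabilities after one oracle answer.

    probs:    {diagnosis: current probability}, assumed to sum to 1
    predicts: {diagnosis: True, False or None}; None marks a diagnosis
              that makes no prediction for the query
    answer:   the oracle's answer (True for yes, False for no)
    """
    def likelihood(name):
        predicted = predicts[name]
        if predicted is None:
            return 0.5
        return 1.0 if predicted == answer else 0.0

    weighted = {name: likelihood(name) * p for name, p in probs.items()}
    evidence = sum(weighted.values())  # equals p(Q_s = v) for normalized probs
    return {name: w / evidence for name, w in weighted.items()}

# Four equally probable diagnoses; two predict "no", two predict "yes".
probs = {"D1": 0.25, "D2": 0.25, "D3": 0.25, "D4": 0.25}
predicts = {"D1": False, "D2": False, "D3": True, "D4": True}
posterior = update(probs, predicts, answer=False)
print(posterior)  # {'D1': 0.5, 'D2': 0.5, 'D3': 0.0, 'D4': 0.0}
```

The sketch does not guard against an answer that rejects every diagnosis, in which case the evidence would be zero.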



Example 1 (continued) Suppose that the debugger is
not provided with any information about possible
failures and therefore it is assumed that all syntax
elements fail with the same probability 0.01 and
therefore p(ax_i) = 0.01. Using Equation 2 we can
calculate probabilities for each diagnosis. For instance, D1
suggests that only one axiom ax1 should be modified by
the user. Hence, we can calculate the probability of
diagnosis D1 as follows: p(D1) = p(ax1)(1 - p(ax2))(1 -
p(ax3))(1 - p(ax4)) = 0.0097. All other minimal
diagnoses have the same probability, since every other
minimal diagnosis suggests the modification of one
axiom. To simplify the discussion we only consider
minimal diagnoses for the query selection. Therefore, the
prior probabilities of the diagnoses can be normalized
to p(D_j) = p(D_j) / Σ_{D_k ∈ D} p(D_k) and are equal to 0.25.
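
The prior computation and normalization of this example can be reproduced with a small sketch. The mapping of D2, D3 and D4 to single axioms is an assumption made for illustration (the text only states that each of them modifies one axiom), and the function name is hypothetical.

```python
from math import prod

def diagnosis_prior(diagnosis, axiom_probs):
    """Equation 2: axioms in the diagnosis are faulty, all others correct."""
    return prod(p if ax in diagnosis else 1.0 - p
                for ax, p in axiom_probs.items())

axiom_probs = {"ax1": 0.01, "ax2": 0.01, "ax3": 0.01, "ax4": 0.01}

# Assumed for illustration: each minimal diagnosis contains exactly one axiom.
diagnoses = {"D1": {"ax1"}, "D2": {"ax2"}, "D3": {"ax3"}, "D4": {"ax4"}}

raw = {name: diagnosis_prior(d, axiom_probs) for name, d in diagnoses.items()}
total = sum(raw.values())
priors = {name: p / total for name, p in raw.items()}

print(round(raw["D1"], 4))  # 0.0097
print({name: round(p, 2) for name, p in priors.items()})
```

After normalization each of the four minimal diagnoses has a prior of approximately 0.25.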

Given the prior probabilities of the diagnoses and a
set of queries (see Table 2) we evaluate the score
function (Equation 3) for each query. E.g. for the first query
Q1 : {B(w)} the probability p(D^∅) = 0 and the
probabilities of both the positive and negative outcomes are:
p(Q1 = 1) = p(D2) + p(D3) + p(D4) = 0.75 and
p(Q1 = 0) = p(D1) = 0.25. Therefore the query score
is sc(Q1) = 0.1887.

The scores computed during the initial stage (see
Table 4) suggest that Q2 is the best query. Note that
Table 4 lists the minimized queries. Taking into
account that D1 is the target diagnosis the oracle answers
no to the query. The additional information obtained
from the answer is then used to update the probabilities
of diagnoses using Equation 5. Since D1 and D2
predicted this answer, their probabilities are updated.



Query          Initial score   Q2 = yes
Q1 : {B(w)}    0.1887          0
Q2 : {C(w)}    0               1
Q3 : {Q(w)}    0.1887          1

Table 4: Expected scores for queries (p(ax_i) = 0.01)

Query          Initial score
Q1 : {B(w)}    0.250
Q2 : {C(w)}    0.408
Q3 : {Q(w)}    0.629

Table 5: Expected scores for queries (p(ax1) = 0.025, p(ax2) =
p(ax3) = p(ax4) = 0.01)



By Equation 5, p(D1) = p(D2) = 0.25 / 0.5 = 0.5. The
probabilities of diagnoses D3 and D4, which are rejected by
the outcome, are also updated: p(D3) = p(D4) = 0.

On the next iteration the algorithm recomputes the
scores using the updated probabilities. The results show
that Q1 is the best query. The other two queries Q2 and
Q3 are irrelevant since no information will be gained if
they are performed. Given the negative feedback of an
oracle to Q1, we update the probabilities p(D1) = 1 and
p(D2) = 0. In this case the target diagnosis D1 was
identified using the same number of steps as the
"Split-in-half" heuristic.

However, if the user specifies directly that the first
axiom is more likely to fail, e.g. p(ax1) = 0.025, then the
first query will be Q1 : {B(w)} (see Table 5). The
recalculation of the probabilities given the negative outcome
Q1 = 0 sets p(D1) = 1 and p(D2) = p(D3) = p(D4) =
0. Therefore the debugger identifies the target diagnosis
in only one step.

Example 2 (continued) Suppose that in ax4 the user
specified ∀s.A instead of ∃s.A and ¬∃s.M3 instead of
∃s.¬M3 in ax2. Therefore D4 is the target diagnosis.
Moreover, the debugger is provided with observations
of three types of faults: (1) conjunction/disjunction
occurs with probability p1 = 0.001, (2) negation p2 =
0.01, and (3) restrictions p3 = 0.05. Using
Equation 1 we can calculate the probability of the axioms
containing an error: p(ax1) = 0.0019, p(ax2) = 0.1074,
p(ax3) = 0.012, p(ax4) = 0.051, and p(ax5) = 0.001.
These probabilities are exploited to calculate the prior
probabilities of the diagnoses (see Table 6) and to
initialize the query selection process. To simplify matters
we focus on the set of minimal diagnoses.

On the first iteration the algorithm determines that
Q3 is the best query and asks an oracle whether O_t ⊨
{M1 ⊑ B} is true or not (see Table 7). The obtained
information is then used to recalculate the probabilities of
the diagnoses and to compute the next best query, i.e.
Q4, and so on. The query process stops after the third
query, since D4 is the only diagnosis that has the
probability p(D4) > 0.

Given the feedback of the oracle Q4 = yes for the
second query, the updated probabilities of the
diagnoses show that the target diagnosis has a probability
of p(D4) = 0.9918 whereas p(D3) is only 0.0082. In
order to reduce the number of queries a user can specify
a threshold, e.g. σ = 0.95. If the absolute difference in
probabilities of the two most probable diagnoses is greater
than this threshold, the query process stops and returns
the most probable diagnosis. Therefore, in this
example the debugger based on the entropy query selection
requires fewer queries than the "Split-in-half" heuristic.
Note that already after the first answer Q3 = yes the
most probable diagnosis D4 is three times more likely
than the second most probable diagnosis D1. Given
such a great difference we could suggest stopping the
query process after the first answer by setting σ = 0.65.
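
The acceptance test on the threshold can be written down directly. This is a minimal sketch of the stopping rule described in this example, with a hypothetical function name; treating a single remaining diagnosis as accepted is an assumption.

```python
def can_stop(probs, sigma):
    """True if the two most probable diagnoses differ by more than sigma."""
    ranked = sorted(probs.values(), reverse=True)
    if len(ranked) < 2:
        return True  # assumption: a single remaining diagnosis is accepted
    return ranked[0] - ranked[1] > sigma

# Probabilities after the second answer of Example 2.
print(can_stop({"D4": 0.9918, "D3": 0.0082}, sigma=0.95))  # True
```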

4. Implementation details 

The iterative ontology debugger (Algorithm 1) takes
a faulty ontology O as input. Optionally, a user can
provide a set of axioms S that are known to be
correct as well as a set P of axioms that must be entailed
by the target ontology and a set N of axioms that must
not. If these sets are not given, the corresponding input
arguments are initialized with ∅. Moreover, the
algorithm takes a set FP of fault probabilities for axioms
ax_i ∈ O, which can be computed as described in
Section 3 by exploiting knowledge about typical user
errors. The two other arguments σ and n are used to
speed up the algorithm. σ sets the
diagnosis acceptance threshold that defines the absolute
difference in probabilities of the two most probable
diagnoses. The parameter n defines the maximum number
of most probable diagnoses that should be considered
by the algorithm in each iteration. A further
performance gain in Algorithm 1 can be achieved if we
approximate the set of the n most probable diagnoses with
the set of the n most probable minimal diagnoses, i.e.
we neglect non-minimal diagnoses. We call this set of
at most n most probable minimal diagnoses the
leading diagnoses. Note that under the reasonable assumption
that the fault probability of each axiom p(ax_i) is less
than 0.5, for every non-minimal diagnosis ND there
exists a minimal diagnosis D ⊂ ND which, by
Equation 2, is more probable than ND. Consequently
the query selection algorithm operates on the set
of minimal diagnoses instead of all diagnoses
(including non-minimal ones). However, the algorithm can be
adapted with moderate effort to consider non-minimal
diagnoses.

We implemented the computation of diagnoses
following the approach proposed by Friedrich et al. [8].
The authors employ the combination of two algorithms,
QuickXplain [13] and HS-Tree [14]. In a standard
implementation the latter is a breadth-first search
algorithm that takes an ontology O, sets of logical sentences
P and N, and the maximal number of most probable
minimal diagnoses n as input. In particular, minimal
hitting set generation and the search for minimal conflict
sets are interleaved. This is motivated by the fact that
generating a subset of all minimal diagnoses may require
only a subset of all minimal conflict sets. In our case we
compute at most n minimal diagnoses. This is an important
property because the number of minimal conflict sets can
grow exponentially in the size of the ontology. Note that a
minimal diagnosis is a minimal hitting set of all minimal
conflict sets. However, in order to verify that a set of
axioms is a minimal diagnosis, the set of all minimal
conflict sets is not needed. In our implementation of
HS-Tree we use the uniform-cost search strategy. Given
additional information in terms of axiom fault
probabilities FP, the algorithm expands a leaf node in the
search tree if it is an element of the maximum probability
hitting set, given the currently found set of minimal
conflict sets. The probability of each hitting set can be
computed using Equation 2. Consequently, the algorithm
computes a set of diagnoses ordered by their probability,
starting from the most probable one. HS-Tree terminates if
either the n most probable minimal diagnoses are
identified or there are no further minimal diagnoses.

The search algorithm computes minimal conflicts
using QuickXplain. This algorithm, given a set of axioms
AX and a set of correct axioms S, returns a minimal
conflict set CS ⊆ AX, or ∅ if the axioms AX ∪ S are
consistent. Minimal conflicts are computed on demand by
HS-Tree while exploring the search space.

In order to take past answers into account, HS-Tree
updates the prior probabilities of the diagnoses by
evaluating Equation 5. The query history is stored in
QH as well as in the updates of P and N. As a result
HS-Tree returns a set of tuples ⟨D_i, p(D_i)⟩ where D_i is
contained in the set of the n most probable minimal
diagnoses (leading diagnoses) and p(D_i) is its probability
calculated using Equation 2 and Equation 5.

In the query-selection phase Algorithm 1 calls the
selectQuery function (Algorithm 2) to generate a tuple
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Table 7: Expected scores for queries 



Algorithm 1: ontoDebugging(O, S, P, N, FP, n, σ)
Input: ontology O, set of background axioms S,
       sets P, N of sentences to be (not) entailed,
       set of fault probabilities for axioms FP,
       maximum number of most probable
       minimal diagnoses n,
       acceptance threshold σ
Output: a diagnosis D

DP ← ∅; QH ← ∅; T ← ⟨∅, ∅, ∅, ∅⟩;
while belowThreshold(DP, σ) ∧ getScore(T) ≠ 1 do
    DP ← HS-Tree(O, S ∪ P, N, FP, QH, n);
    T ← selectQuery(DP, O, S, P);
    Q ← getQuery(T);
    if Q = ∅ then exit loop;
    if getAnswer(O_t ⊨ Q) then P ← P ∪ Q;
    else N ← N ∪ Q;
    QH ← QH ∪ {T};
return mostProbableDiagnosis(DP);



T = ⟨Q, D^P, D^N, D^∅⟩ where Q corresponds to the
minimal score query (Equation 3) for the sets of diagnoses
D^P, D^N and D^∅. The generation algorithm implements
a depth-first search: it removes the top element of the
set DP and calls itself recursively to generate all
possible subsets of the leading diagnoses. In each leaf node
of the search tree the generate function calls
createQuery to create a query given a set of diagnoses D^P as
described in Section 2.2, i.e. computation of common
entailments followed by a partitioning of the diagnoses.
If a query for the set D^P does not exist or D^P = ∅ then
createQuery returns an empty tuple T = ⟨∅, ∅, ∅, ∅⟩. In
all other nodes of the tree the algorithm selects the
tuple that corresponds to a query with the minimal score
by using the getScore function. The latter might
implement the entropy-based measure (Equation 3),
"Split-in-half" or any other preference criterion. Given an empty
tuple T = ⟨∅, ∅, ∅, ∅⟩ the function should return the
highest possible score of 1. Moreover, if the scores are
equal then the algorithm returns the tuple where Q has the
smallest cardinality in order to reduce the answering
effort. The function minimizeQuery iteratively reduces the
query Q of the resulting tuple ⟨Q, D^P, D^N, D^∅⟩ by
applying QuickXplain such that the sets D^P, D^N and D^∅
are preserved. However, minimizeQuery first checks whether
the query was already minimized.
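
The recursive subset enumeration can be sketched as follows. This is an illustration under several assumptions, not the implementation described here: `create_query` is a hypothetical stand-in that looks queries up in a small hand-made catalogue instead of computing common entailments with a reasoner, ties are resolved in favour of the left branch rather than by query minimization and cardinality, and all names are invented for the example.

```python
from math import log2

EMPTY = (None, frozenset(), frozenset(), frozenset())

def get_score(tup, probs):
    """Entropy-based score (Equation 3); the empty tuple scores 1."""
    query, d_pos, d_neg, d_none = tup
    if query is None:
        return 1.0
    p_none = sum(probs[d] for d in d_none)
    result = p_none + 1.0
    for part in (d_pos, d_neg):
        p = sum(probs[d] for d in part) + p_none / 2.0
        if p > 0.0:
            result += p * log2(p)
    return result

def generate(d_pos, remaining, probs, create_query):
    """Enumerate every subset of the leading diagnoses as a candidate D^P
    and return the tuple with the minimal score."""
    if not remaining:
        return create_query(frozenset(d_pos))
    head, tail = remaining[0], remaining[1:]
    left = generate(d_pos, tail, probs, create_query)            # head not in D^P
    right = generate(d_pos | {head}, tail, probs, create_query)  # head in D^P
    return left if get_score(left, probs) <= get_score(right, probs) else right

# Hypothetical catalogue: D^P -> (query name, D^N); D^∅ is empty here.
CATALOGUE = {
    frozenset({"D2", "D3", "D4"}): ("Q1", frozenset({"D1"})),
    frozenset({"D3", "D4"}): ("Q2", frozenset({"D1", "D2"})),
}

def create_query(d_pos):
    """Stand-in for createQuery: returns the empty tuple if no query exists."""
    if d_pos in CATALOGUE:
        query, d_neg = CATALOGUE[d_pos]
        return (query, d_pos, d_neg, frozenset())
    return EMPTY

probs = {"D1": 0.25, "D2": 0.25, "D3": 0.25, "D4": 0.25}
best = generate(frozenset(), ["D1", "D2", "D3", "D4"], probs, create_query)
print(best[0], get_score(best, probs))  # Q2 0.0
```

With n leading diagnoses the recursion visits 2^n leaves, which is why n is kept small.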

In Algorithm 1 the function getQuery simply selects
the query from the tuple stored in T and subsequently
the user is asked by getAnswer. Depending on the
answer of the oracle, Algorithm 1 extends either the set
P or the set N. This is done to exclude corresponding
diagnoses from the results of HS-Tree in further
iterations. Note that the algorithm can be easily adapted to
allow the oracle to reject a query if the answer is unknown.
In this case the algorithm proceeds with the next best query
until no further queries are available.

Algorithm 1 stops if there is a diagnosis probability
above the acceptance threshold σ or if no query can be
used to differentiate between the remaining diagnoses
(i.e. the score of the minimal score query is 1). The most
probable diagnosis is then returned to the user. If it is
impossible to differentiate between a number of highly
probable minimal diagnoses, the algorithm returns a set
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Algorithm 2: selectQuery(DP, O, S, P) 
Input: DP set of pairs ⟨D_i, p(D_i)⟩, O ontology,
       S set of background axioms, P set of
       axioms that must be entailed by the target
       ontology
Output: a tuple ⟨Q, D^P, D^N, D^∅⟩

T ← generate(∅, DP, O, S, P);
return minimizeQuery(T);

function generate(D^P, DP, O, S, P)
returns a tuple ⟨Q, D^P, D^N, D^∅⟩
    if DP = ∅ then
        return createQuery(D^P, O, S, P);
    ⟨D, p(D)⟩ ← pop(DP);
    left ← generate(D^P, DP, O, S, P);
    right ← generate(D^P ∪ {⟨D, p(D)⟩}, DP, O, S, P);
    if getScore(left) < getScore(right) then
        return left;
    else if getScore(left) > getScore(right) then
        return right;
    left ← minimizeQuery(left);
    right ← minimizeQuery(right);
    return minCardinalityQuery(left, right);



that includes all of them. Moreover, in the first case
(termination on σ), the algorithm can continue if the
user is not satisfied with the returned diagnosis and at
least one query exists.

Additional performance improvements can be
achieved by using greedy strategies in Algorithm 2.
The idea is to guide the search in such a way that a leaf
node of the left-most branch of the search tree contains
a set of diagnoses D^P that might result in a tuple
⟨Q, D^P, D^N, D^∅⟩ with a low-score query. This method is
based on the property of Equation 3 that sc(Q) = 0 if

$$\sum_{D_j \in \mathbf{D}^P} p(D_j) = \sum_{D_j \in \mathbf{D}^N} p(D_j) = 0.5 \quad \text{and} \quad p(\mathbf{D}^{\emptyset}) = 0$$

Consequently, the query selection problem can be
presented as a two-way number partitioning problem:
given a set of numbers, divide them into two sets such
that the difference between the sums of the numbers
in each set is as small as possible. The Complete
Karmarkar-Karp (CKK) algorithm [18], which is one
of the best algorithms developed for the two-way
partitioning problem, corresponds to an extension of
Algorithm 2 with a set differencing heuristic [19]. The
algorithm stops if either the optimal solution to the
two-way partitioning problem is found or there are no further
subsets to be investigated.
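
The set differencing heuristic itself is easy to illustrate. The sketch below shows only the plain Karmarkar-Karp heuristic on integers, not the complete CKK search and not its integration with query generation; the function name is hypothetical.

```python
import heapq

def karmarkar_karp_difference(numbers):
    """Difference between the two subset sums found by the Karmarkar-Karp
    differencing heuristic (a heuristic result, not necessarily optimal)."""
    heap = [-x for x in numbers]  # max-heap emulated by negation
    heapq.heapify(heap)
    while len(heap) > 1:
        largest = -heapq.heappop(heap)
        second = -heapq.heappop(heap)
        # Committing the two largest numbers to different subsets is
        # equivalent to replacing them by their difference.
        heapq.heappush(heap, -(largest - second))
    return -heap[0] if heap else 0

print(karmarkar_karp_difference([8, 7, 6, 5, 4]))  # 2
```

For this input a perfect partition exists ({8, 7} and {6, 5, 4}, both summing to 15), but the heuristic alone ends with a difference of 2. A complete search that also explores the alternative branch, in which the two largest numbers are placed in the same subset, can recover the optimum.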

The main drawback of CKK applied to query
selection is that none of its pruning techniques can be
used, since we cannot guarantee that a query can always
be generated for a given set of diagnoses D^P. Even if the
algorithm finds an optimal solution to the two-way
partitioning problem, it still has to investigate all subsets of
the set of diagnoses in order to find the minimum score
query. To avoid this exhaustive search we extended
CKK with one more termination criterion: the
search stops if a query is found with a score below some
predefined threshold γ.

5. Evaluation 

The evaluation of our approach was performed using
generated examples and real-world ontologies
presented in Table 8. We employed generated examples
to perform controlled experiments where the number of 
minimal diagnoses and their cardinality could be varied 
to make the identification of the target diagnosis more 
difficult. The main goal of the experiment using real- 
world ontologies is to demonstrate the applicability of 
our approach in real-world settings. 

For the first test we created a generator which takes a
consistent and coherent ontology, a set of fault patterns
together with their probabilities, the minimum number
of minimum cardinality diagnoses m, and the required
cardinality |D_t| of these minimum cardinality diagnoses
as inputs. For the tests we assume that the target
diagnosis has cardinality |D_t|. The output of the
generator is an alteration of the input ontology for which at
least the given number of minimum cardinality
diagnoses with the required cardinality exist. In order to
introduce inconsistencies and incoherences, the
generator applies fault patterns randomly to the input ontology
depending on their probabilities.

In this experiment we took five fault patterns from
a case study reported by Rector et al. [4] and assigned
fault probabilities according to their observations of
typical user errors. Thus we assumed that in the cases
(a) and (b) (see Section 2.2), when an axiom includes
some roles (i.e. property assertions), axiom descriptions
are faulty with a probability of 0.025, in the cases (c)
and (d) 0.01 and in the case (e) 0.001. In each
iteration the generator randomly selected an axiom to be
altered and applied a fault pattern to this axiom. Next,
another axiom was selected using the concept
taxonomy and altered correspondingly to introduce an
incoherency/inconsistency. The fault patterns were
randomly selected in each step using the probabilities
provided above.

   Ontology        Axioms  #C/#P/#I     #CS/min/max  #D/min/max  Domain
1. Chemical        114     48/20/0      6/5/6        6/1/3       Chemical elements
2. Koala           44      21/5/6       3/4/4        10/1/3      Training
3. Sweet-JPL       2579    1537/121/50  8/1/13       13/8/8      Earth science
4. miniTambis      173     183/44/0     3/3/6        48/3/3      Biological science
5. University      50      30/12/4      4/3/5        90/3/4      Training
6. Economy         1781    339/53/482   8/3/4        864/4/9     Mid-level
7. Transportation  1300    445/93/183   9/2/6        1782/6/9    Mid-level

Table 8: Diagnosis results for some real-world ontologies presented in [7]. #C/#P/#I are the numbers of concepts, properties, and individuals in
an ontology. #CS/min/max are the number of conflict sets, their minimum and maximum cardinality. The same notation is used for diagnoses
#D/min/max. These ontologies are available upon request.

For instance, given the description of a randomly
selected concept A and the fault pattern "misuse of
negation", we added the construct ⊓ ¬X to the description of
A, where X is a new concept name. Next, we randomly
selected concepts B and S such that S ⊑ A and S ⊑ B
and added ⊓ X to the description of B. During the
generation process, we applied the HS-Tree algorithm
after each introduction of an incoherency/inconsistency to
control two parameters: the minimum number of
minimal cardinality diagnoses in the ontology and their
cardinality. The generator continues to introduce
incoherences/inconsistencies until the specified parameter
values are reached. For instance, if the minimum number
of minimum cardinality diagnoses is equal to m = 6 and
their cardinality is |D_t| = 4, then the generated ontology
will include at least 6 diagnoses of cardinality 4 and
possibly some additional number of minimal diagnoses of
higher cardinalities.

The resulting faulty ontology as well as the fault
patterns and their probabilities were inputs for the ontology
debugger. The acceptance threshold σ was set to 0.95
and the number of most probable minimal diagnoses n
was set to 9. One of the minimal diagnoses with the
required cardinality was randomly selected as the target
diagnosis. Note that the target ontology is not equal to the
original ontology, but rather is a corrected version of the
altered one, in which the faulty axioms were repaired by
replacing them with their original (correct) versions
according to the target diagnosis. The tests were done on
ontologies bike2 to bike9, bcs3, galen and galen2 from
Racer's benchmark suite.

The average results of the evaluation performed on
each test suite (presented in Figure 2) show that the
entropy-based approach outperforms the "Split-in-half"
heuristic as well as the random query selection strategy
by more than 50% for the |D_t| = 2 case due to its
ability to estimate the probabilities of diagnoses and to stop
when the target diagnosis crossed the acceptance
threshold. On average the algorithm required 8 seconds to
generate a query. Figure 2 also shows that the number
of required queries increases as the cardinality of the
target diagnosis increases. This holds for the random and
"Split-in-half" methods (not depicted) as well.
However, the entropy-based approach is still better than the
"Split-in-half" method even for diagnoses with
increasing cardinality. The approach required more queries
to discriminate between high cardinality diagnoses
because the prior probabilities of these diagnoses tend to
converge.

Racer's benchmark suite is available at
http://www.racer-systems.com/products/download/benchmark.phtml

In the tests performed on the real-world ontologies
we evaluated the performance of the entropy-based
debugging algorithm given different user estimations of
prior fault probabilities. The priors are very
important since they are used by the entropy-based method
to identify the best query to be asked. Given some
misleading priors the entropy-based algorithm might
require more queries to identify the target diagnosis. In
our experiment we differentiated between three
distributions of the prior fault probabilities: extreme,
moderate and uniform (see Figure 3 for an example).
The extreme distribution simulates a situation in which a
user assigns very high failure probabilities to a small
number of syntax elements. That is, the user is quite
sure that exactly these elements are causing a fault. For
instance, the user has problems with formulating
restrictions in OWL whereas all other elements, like
subsumption, conjunction, etc., used in a faulty ontology are
well understood. In the case of a moderate distribution
the user provides a slight bias towards some syntax
elements. This distribution has the same motivation as the
extreme one; however, in this case the user is less sure
about possible causes of the problem. Both extreme and
moderate distributions correspond to the exponential
distribution with λ = 1.75 and λ = 0.5 respectively. The
uniform distribution models the situation in which the
user did not provide any prior fault probabilities and the
system assigns equal probabilities to all syntax elements
found in a faulty ontology. Of course the user can make a
mistake while estimating the priors and assign higher fault
probabilities to elements that are correct. Therefore, for
each of the three distributions we differentiate between
good, average and bad cases. In the good case the user's
estimates of the prior fault probabilities are correct and
the target diagnosis receives a high probability. The
average case corresponds to the situation in which the target
diagnosis is neither favored nor penalized by the priors.
In the bad case the prior distribution predicts the target
diagnosis incorrectly and, consequently, its probability
is quite low.

Figure 2: Average number of queries required to select the target diagnosis D_t with threshold σ = 0.95. Random and "Split-in-half" are shown for
the cardinality of minimal diagnoses |D_t| = 2.

We executed 30 tests for each of the combinations
of the distributions and cases with acceptance threshold
σ = 0.85 and number of most probable minimal
diagnoses n = 9. Each iteration started with the generation
of a set of prior fault probabilities of syntax elements by
sampling from a selected distribution (extreme,
moderate or uniform). Given the priors we computed the set
of all minimal diagnoses D of a given ontology and
selected the target one according to the chosen case (good,
average or bad). In the good case the prior probabilities
favor the target diagnosis and, therefore, it should be
selected from the diagnoses with high probability. The set
of diagnoses was ordered according to their
probabilities and the algorithm iterated through the set starting
from the most probable element. In each iteration j a
diagnosis D_j was added to the set G if Σ_{i≤j} p(D_i) ≤ 1/3
and to the set A if Σ_{i≤j} p(D_i) ≤ 2/3. The obtained set G
contained all most probable diagnoses, which we considered
as good. All diagnoses in the set A \ G were classified
as average and the remaining diagnoses D \ A as bad.
Depending on the selected case we randomly selected
one of the diagnoses as the target from the appropriate
set.

The results of the evaluation presented in Table 9
show that the entropy-based query selection approach
clearly outperforms "Split-in-half" in good and average
cases for the three probability distributions. The plot of
the average number of queries required to identify the
target diagnosis presented in Figure 4 shows that the
performance of the entropy-based method does not depend on
the type of the distribution provided by the user. In the
uniform case good results were still observed, since the
diagnoses have different cardinality and structure, i.e.
they include different syntax elements. Consequently,
even if equal probabilities for all syntax elements
(uniform distribution) are given, the probabilities of
diagnoses are different. These differences provided enough
bias to the entropy-based method. Only in the case of the
Sweet-JPL ontology was the bias insufficient and
sometimes misleading, since all diagnoses in this ontology are
of the same cardinality and have similar structure. A
major loss of performance can only be observed if the
user provided misleading priors making the target
diagnosis improbable. Therefore, we can conclude that
the user needs to provide only rough estimates of
the prior fault probabilities, as long as they favor the
target diagnosis. The differences between probabilities of
individual syntax elements do not influence the
results of the query selection and affect only the number
of outliers, i.e. the cases in which the diagnosis approach
required either few or many queries compared to the
average.
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Figure 3 : Example of prior fault probabilities of syntax elements sam- 
pled from extreme, moderate and uniform distributions. 



Note that "Split-in-half" is inefficient in comparison
to the entropy method in all good, average and bad cases
when applied to ontologies with a large number of
diagnoses, such as Economy and Transportation. The
main problem is that no stopping criterion can be used with
the greedy method as it does not provide any ordering
on the set of diagnoses. Therefore, it has to continue
until no further queries can be generated, i.e. only one
minimal diagnosis exists or there are no discriminating
queries.

Another interesting observation is that both
methods often eliminated more than n diagnoses
in one iteration. For instance, in the case of the
Transportation ontology both methods were able to remove
hundreds of minimal diagnoses with a small number of
queries. The main reason for this behavior is the relations
between the diagnoses. That is, addition of a query to
either P or N allows the method to remove not only the
diagnoses in the sets D^P or D^N, but also some unobserved
diagnoses that were not in any of the sets of n
leading diagnoses computed by HS-Tree. Given the sets P
and N, HS-Tree automatically invalidates all diagnoses
which do not fulfill the requirements (see Definition 1).

The extended CKK method presented in Section 4
was evaluated in the same settings as the complete
Algorithm 2 with acceptance threshold γ = 0.1. The
obtained results presented in Figure 5 show that the
extended CKK method improves the time of a debugging
session by at least 50% while requiring on average 0.2
queries more than Algorithm 2. In some cases (mostly
for the uniform distribution) the debugger using greedy
search required even fewer queries than Algorithm 2
because of the inherent uncertainty of the domain.




Figure 4: Average number of queries required to identify the target
diagnosis. [Plot not reproduced; recoverable panel titles: a) Extreme
distribution, c) Uniform distribution; series: "Split-in-half" and
Entropy; cases: Good, Average, Bad.]



6. Related work 

To the best of our knowledge, no sequential ontology debugging
methods (employing either "Split-in-half" or entropy-based query
selection) have been proposed to debug faulty ontologies so far.
Diagnosis methods for ontologies are introduced in [6, 7, 8].
Ranking of diagnoses and proposing a target diagnosis is presented
in [10]. This method uses a number of measures such as: (a) the
frequency with which an axiom appears in conflict sets, (b) the
impact on an ontology in terms of its "lost" entailments when some
axiom is modified or removed, (c) ranking of test cases, (d)
provenance information about the axiom, and (e) syntactic relevance.
                             Extreme              Moderate             Uniform
Ontology        Case      min    avg    max    min    avg    max    min    avg    max

Entropy-based query selection
Chemical        Good            1.77      3      1   2.03      3      1    1.8      3
                Avg.            1.93      3      1   1.97      3      1    1.9      3
                Bad         2   2.93      4      2   3.07      4      2   3.23      4
Koala           Good            1.83      3      1    2.5      4      2    2.6      3
                Avg.             2.1      4      1    2.4      4      2   2.73      3
                Bad         2    4.6      8      2    4.1      6      3   3.73      5
Sweet-JPL       Good            3.27      6      1    3.4      5      3   3.37      4
                Avg.             3.4      6      1   3.57      5      3   4.03      5
                Bad         3    4.9      7      3   4.03      7      3    4.3      6
miniTambis      Good            2.53      4      2   3.03      4      2    2.9      3
                Avg.             2.7      4      2   3.17      4      3    3.4      4
                Bad         3   5.27      9      3   4.93      8      3    4.9      7
University      Good        2    2.9      4      2   2.83      3      3      3      3
                Avg.        1   3.33      7      3    3.8      5      3   3.47      5
                Bad         4   7.13     22      3    7.1     13      4   6.87     10
Economy         Good        2    2.8      3      2   2.96      3      3    3.1      5
                Avg.        2    3.1      4      3    3.2      4      3   4.03      5
                Bad         8   13.1     20      4   12.7     21      8   15.5     20
Transportation  Good        3    3.9      6      3   4.76      8      3   5.86      8
                Avg.        3    4.2      6      3   5.83      9      3    6.8      9
                Bad         9   15.8     31     10   15.6     20      8   16.5     30

"Split-in-half" query selection
Chemical        Good        2   2.67      3      2   2.67      3      2   2.77      3
                Avg.        2   2.63      3      2   2.67      3      2   2.83      3
                Bad         2   2.63      3      2   2.67      3      2   2.43      3
Koala           Good        3   3.63      4      2   3.67      4      3    3.6      4
                Avg.        2   3.47      4      2   3.57      4      3    3.5      4
                Bad         3    3.2      4      3   3.23      4      3   3.27      4
Sweet-JPL       Good        3   4.17      5      3      4      5      4   4.33      5
                Avg.        3   3.77      5      3   3.77      5      3   3.57      4
                Bad         3   4.13      5      3    3.7      5      3    3.9      5
miniTambis      Good        5   5.73      7      5   5.57      7      5   5.53      6
                Avg.        5   5.47      7      5    5.7      7      5    5.4      6
                Bad         4   5.67      7      5   5.67      7      4    5.8      7
University      Good        5   6.23      8      5   6.13      8      5   6.17      8
                Avg.        5   6.33      8      4    6.3     10      5   6.27      8
                Bad         5   7.13     10      5   6.67      9      5   6.67      8
Economy         Good        5  10.87     28      5  12.63     42      7   13.6     19
                Avg.        6  14.85     30      7  13.47     26      7   14.8     27
                Bad         7   16.1     39      9  16.42     36      9  17.47     33
Transportation  Good        5  10.87     32      5  13.53     26      5   14.5     26
                Avg.        5  13.57     27      5  14.07     26      5   15.4     22
                Bad         8   16.5     32     11  17.67     32      6   18.6     31

Table 9: Minimum, average and maximum number of queries required by the entropy-based and "Split-in-half" query selection methods to identify
the target diagnosis in a real-world ontology. Ontologies are ordered by the number of diagnoses.
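
As a small worked example, the relative saving of the entropy-based
method can be computed from the average values reported in Table 9.
The snippet below uses three rows only (Extreme distribution, Good
case); the choice of rows and the helper name relative_reduction are
illustrative assumptions, not part of the evaluation itself.

```python
# Average number of queries (Extreme distribution, Good case) from Table 9:
# ontology -> (entropy-based, "Split-in-half")
avg_queries = {
    "Chemical": (1.77, 2.67),
    "Economy": (2.8, 10.87),
    "Transportation": (3.9, 10.87),
}


def relative_reduction(entropy_avg, split_avg):
    """Fraction of queries saved by the entropy-based selection."""
    return 1.0 - entropy_avg / split_avg


reductions = {name: relative_reduction(e, s)
              for name, (e, s) in avg_queries.items()}

for name, r in reductions.items():
    print(f"{name}: {r:.0%} fewer queries on average")
```

For these rows the saving is roughly one third for Chemical and well
over one half for Economy and Transportation.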
Figure 5: Average time/query gain resulting from the application of
the extended CKK partitioning algorithm. The whiskers indicate the
maximum and minimum possible gain of queries/time by using extended
CKK. [Plot not reproduced; series: average number of queries and
average time; cases: Good, Average, Bad.]



All these measures are evaluated for each axiom in a conflict set.
The scores are then combined into a rank value which is associated
with the corresponding axiom. These ranks are then used by a
modified HS-Tree algorithm that identifies diagnoses with a minimal
rank. In that work, no query generation and selection strategy is
proposed for the case in which the target diagnosis cannot be
determined reliably with the given a-priori knowledge. In our work,
additional information is acquired until the target diagnosis can
be identified with confidence. In general, the work of [10] can be
combined with the one presented in this paper, as axiom ranks can
be taken into account together with other observations for
calculating the prior probabilities of the diagnoses.

The idea of selecting the next best query based on the expected
entropy was exploited in the generation of decision trees [20] and
further refined for selecting measurements in the model-based
diagnosis of circuits [15]. We extended these methods to query
selection in the domain of ontology debugging.

7. Conclusions 

In this paper we presented an approach to the sequential diagnosis
of ontologies. We showed that the axioms generated by
classification and realization can be exploited to generate queries
which differentiate between diagnoses. To rank the utility of these
queries we employ knowledge about typical user errors in ontology
axioms. Based on the likelihood of an ontology axiom to contain an
error, we predict the information gain produced by a query result,
enabling us to select the next best query according to a
one-step-lookahead entropy-based scoring function. We outlined the
implementation of a sequential debugging algorithm and compared our
proposed method with a "Split-in-half" strategy. Our experiments
showed a significant reduction in the number of queries required to
identify the target diagnosis. In addition, our evaluation
employing real-world ontologies indicates that even a rough
estimate of the prior probabilities of faults with a moderate
variance allows the advantageous application of the entropy-based
query selection.